Skip to content

feat(cypher): forward secondary whole-row carry through chained reentry (#1256)#1258

Merged
lmeyerov merged 5 commits intomasterfrom
feat/1256-cross-reentry-boundary-carry
May 3, 2026
Merged

feat(cypher): forward secondary whole-row carry through chained reentry (#1256)#1258
lmeyerov merged 5 commits intomasterfrom
feat/1256-cross-reentry-boundary-carry

Conversation

@lmeyerov
Copy link
Copy Markdown
Contributor

@lmeyerov lmeyerov commented May 3, 2026

Summary

Closes the chained-reentry portion of slice 4.3d (#1256) so multi-alias whole-row carry survives more than one reentry boundary. Builds on top of #1248 / #1071 which together enabled single-boundary multi-alias carry.

Closes (partial) #1256 — chained-reentry portion. The free-form intermediate-MATCH case (LDBC SNB IC3 prefix shape, where the trailing MATCH does not start from any carried alias) remains open follow-up.

Before / after

Before (against master 80d80849c):

MATCH (a:A {id: 'a'}), (x:B {id: 'b'})
WITH a, x
MATCH (a)-[:R]->(friend)
WITH friend, x
MATCH (friend)-[:S]->(c)
RETURN c.id, x.id
-- GFQLValidationError: Unknown Cypher alias '__cypher_reentry_x_id__' in RETURN clause

After: returns the expected rows; the secondary alias x's hidden carry column survives the reentry-source rebinding.

What changed

graphistry/compute/gfql/cypher/lowering.py — two extensions to _demote_secondary_whole_row_aliases (the active rewrite path on master after #1071 superseded slice 4.3b's _rewrite_multi_whole_row_prefix):

  1. Forwarding-item drop — bare-identifier projection items in downstream WITH stages whose name is a secondary alias are dropped at compile time before _collect_secondary_property_refs walks them for bare references. Same intent as slice 4.3c, integrated into the active path. Without this, forwarding patterns like WITH a, x, friend, ... tripped the bare-ref failfast even though the property carry already lives as a hidden scalar on the reentry-source's row table.

  2. Hidden-column forwarding — for every (secondary_alias, prop) collected from trailing clauses, the synthesized __cypher_reentry_<S>_<X>__ hidden alias is appended as a bare passthrough item to every downstream WITH stage so each recursive compile_cypher_query call sees it as a scalar carry. Without this, the inner compile failed alias resolution on the rewritten hidden identifier once a chained stage narrowed the projection scope.

Tests

graphistry/tests/compute/gfql/cypher/test_lowering.py:

  • test_string_cypher_admits_multi_stage_secondary_alias_carry_through_chained_reentry — chained reentry where the trailing MATCH continues to use the same primary alias (a → friend → other, both reentry MATCHes rooted at a).
  • test_string_cypher_admits_secondary_alias_carry_across_reentry_source_rebinding — source rebound between boundaries (MATCH (a)-[:R]->(friend) ... MATCH (friend)-[:S]->(c)); x.id carries through.
  • test_string_cypher_admits_multi_alias_distinct_forwarding_through_reentry — multi-alias WITH DISTINCT forwarding through a single boundary.
  • test_string_cypher_failfast_rejects_intermediate_reentry_match_with_no_carried_source — pins the remaining gap (free-form intermediate MATCH that does not start from any carried alias) to the existing scoped error so future closure is regression-locked.

Existing slice 4.1 / 4.3a / 4.3b / 4.3c tests stay green (regression net).

Validation

  • pytest -q graphistry/tests/compute/gfql/cypher/test_lowering.py780 passed, 67 skipped (was 776 before; +4 new tests). No regressions.
  • ./bin/ruff.sh clean.
  • ./bin/mypy.sh on lowering.py clean. (Pre-existing mypy errors in test_lowering.py are unrelated; verified present on master.)
  • Repro plans/1256-cross-reentry-boundary-carry/repro/repro_ic3.py: simple rebinding cases now compile + execute correctly. The literal IC3 query still fails at the free-form intermediate MATCH gate (MATCH (city:City)-[:IS_PART_OF]->(country:Country) — neither alias is carried) — separate follow-up tracked under Cypher/GFQL: forward bounded-reentry carry hidden columns across reentry-source rebinding (LDBC SNB IC3) #1256.
  • cuDF lane: not yet run (no _cudf variant of new tests in this PR — pure compile-time change uses existing scalar-carry runtime plumbing that already has cuDF coverage; will spot-check on DGX before merge).

Test plan

  • pytest -q graphistry/tests/compute/gfql/cypher/
  • ./bin/ruff.sh
  • ./bin/mypy.sh on touched files
  • Hosted CI green (push pending)
  • cuDF spot-check on DGX
  • Review skill converged

Refs

🤖 Generated with Claude Code

lmeyerov and others added 5 commits May 3, 2026 15:04
…ry (#1256)

Closes the chained-reentry portion of slice 4.3d (#1256) so multi-alias whole-row
carry survives more than one reentry boundary.

Two extensions to `_demote_secondary_whole_row_aliases` (the active rewrite path
on master after #1071 superseded slice 4.3b's `_rewrite_multi_whole_row_prefix`):

1. **Forwarding-item drop** — bare-identifier projection items in downstream
   `WITH` stages whose name is a secondary alias are dropped at compile time
   *before* `_collect_secondary_property_refs` walks them for bare references.
   Same intent as slice 4.3c, integrated into the active path. Without this,
   forwarding patterns like `WITH a, x, friend, ...` tripped the bare-ref
   failfast even though the property carry already lives as a hidden scalar.

2. **Hidden-column forwarding** — for every `(secondary_alias, prop)` collected
   from trailing clauses, the synthesized `__cypher_reentry_<S>_<X>__` hidden
   alias is appended as a bare passthrough item to every downstream `WITH`
   stage so each recursive `compile_cypher_query` call sees it as a scalar
   carry. Without this, the inner compile failed alias resolution on the
   rewritten hidden identifier once a chained stage narrowed projection scope.

New positive tests cover three previously-blocked shapes:
* `test_string_cypher_admits_multi_stage_secondary_alias_carry_through_chained_reentry`
  — chained reentry where the trailing MATCH continues to use the primary
* `test_string_cypher_admits_secondary_alias_carry_across_reentry_source_rebinding`
  — source rebound between boundaries (`(a)-[:R]->(friend) ... (friend)-[:S]->(c)`)
* `test_string_cypher_admits_multi_alias_distinct_forwarding_through_reentry`
  — multi-alias `DISTINCT` forwarding through a single boundary

New failfast test pins the remaining gap (free-form intermediate MATCH that
does not start from any carried alias — the LDBC SNB IC3 prefix shape) to the
existing scoped error so future closure is regression-locked.

Validation:
* 780 cypher lowering tests pass (was 776; +4 new).
* Touched-module mypy clean.
* Repro `plans/1256-cross-reentry-boundary-carry/repro/repro_ic3.py`: simple
  rebinding cases now compile + execute correctly; literal IC3 still fails at
  the free-form intermediate MATCH gate (separate follow-up).

Refs #1256 (closes the chained-reentry portion; free-form intermediate MATCH
remains open), #999 (IC3 partial), #989 (row-carrier IR umbrella).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…entry carry (#1256 W1-I1)

Wave-1 review flagged that hidden-column forwarding could in principle silently
mutate grouping when the appended `__cypher_reentry_<S>_<X>__` bare item lands
inside a `WITH a, friend, count(*) AS n` chain — the carry would join the
group key set.

Empirically those queries already fail at the pre-existing
"aggregate would need repeated MATCH rows from a relationship pattern"
failfast, so no silent wrong-result path exists today. Pin that behavior with
a regression-lock test so a future change that lifts the aggregate failfast
must reckon with carry-grouping interaction explicitly.

Also document the DISTINCT/aggregate interaction in `_demote_secondary_whole_row_aliases`
inline so the next reader sees the design tradeoff.

Refs #1256.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…TH (#1256 W2-IMPORTANT-1)

Wave-2 review uncovered a silent wrong-result for the corner shape
`MATCH (a),(x) WITH a, x MATCH (a) WITH a, x, count(*) AS n RETURN x.id, n`
(node-only trailing MATCH + aggregating downstream WITH + carry forward).
PR HEAD returned `xid=None, n=1`; master rejects the same query at the
secondary-whole-row failfast.

Add a guard in `_demote_secondary_whole_row_aliases`: when carry forwarding is
about to append a hidden column to a downstream WITH stage, refuse if any
item in that stage contains an aggregate function call. Raise a scoped #1256
failfast pointing at the gap. The relationship-pattern aggregate path is also
now covered by this guard (the existing "aggregate would need repeated MATCH
rows" failfast still fires for non-carry queries; the scoped error is clearer
when a carry is actually involved).

Test changes:
- Rename `..._with_aggregate_hits_existing_aggregate_failfast` to
  `..._with_aggregate_relationship_match_failfast` and update the regex to
  match the new scoped error.
- New `..._with_aggregate_node_only_match_failfast` regression-locks the
  W2-IMPORTANT-1 case.

Validation:
- 782 cypher lowering tests pass (was 780 before slice 4.3d.2; +2 net).
- Touched-module mypy clean.
- Scoped failfast wording is symmetric with the slice 4.3a/b admit-gate
  failfast (#989) so error consumers can pattern-match.

Refs #1256.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ing failfast (post-#1249 merge)

After merging origin/master (which brought in #1249's row pipeline changes),
the chained-reentry-same-primary positive test now hits the pre-existing
"unique carried node rows" failfast at gfql_unified.py:~1091. This is not a
regression introduced by slice 4.3d.2 — the carry-uniqueness constraint is
fundamental to the current scalar-carry runtime model. Multiple rows sharing
the same `a` value (one per friend after the first reentry) cannot be carried
through to a second reentry that re-uses `a` as the source.

The rebinding shape (Q2-style `(a)-[:R]->(friend) ... (friend)-[:S]->(c)`)
remains validated and continues to pass — that's the actual cross-boundary
case the slice closes.

Retarget the test to assert the unique-rows failfast fires, locking the known
limitation so a future slice that lifts the constraint must update this test
alongside the runtime change.

Refs #1256.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant